PRESEE: An MDL/MML Algorithm to Time-Series Stream Segmenting
نویسندگان
چکیده
Time-series stream is one of the most common data types in data mining field. It is prevalent in fields such as stock market, ecology, and medical care. Segmentation is a key step to accelerate the processing speed of time-series stream mining. Previous algorithms for segmenting mainly focused on the issue of ameliorating precision instead of paying much attention to the efficiency. Moreover, the performance of these algorithms depends heavily on parameters, which are hard for the users to set. In this paper, we propose PRESEE (parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmenting. PRESEE is based on both MDL (minimum description length) and MML (minimum message length) methods, which could segment the data automatically. To evaluate the performance of PRESEE, we conduct several experiments on time-series streams of different types and compare it with the state-of-art algorithm. The empirical results show that PRESEE is very efficient for real-time stream datasets by improving segmenting speed nearly ten times. The novelty of this algorithm is further demonstrated by the application of PRESEE in segmenting real-time stream datasets from ChinaFLUX sensor networks data stream.
منابع مشابه
Minimum Message Length Segmentation
The segmentation problem arises in many applications in data mining, A.I. and statistics, including segmenting time series, decision tree algorithms and image processing. In this paper, we consider a range of criteria which may be applied to determine if some data should be segmented into two or regions. We develop a information theoretic criterion (MML) for the segmentation of univariate data ...
متن کاملThe MML Evolution of Classi cation
Minimum encoding induction (MML and MDL) is well developed theoretically and is currently being employed in two central areas of investigation in machine learning|namely, classiication learning and the learning of causal networks. MML and MDL ooer important tools for the evaluation of models, but ooer little direct help in the problem of how to conduct the search through the model space. Here w...
متن کاملAlgorithms for Segmenting Time Series
As with most computer science problems, representation of the data is the key to ecient and eective solutions. Piecewise linear representation has been used for the representation of the data. This representation has been used by various researchers to support clustering, classication, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain...
متن کاملMessage Length Estimators, Probabilistic Sampling and Optimal Prediction
The Rissanen (MDL) and Wallace (MML) formulations of learning by compact encoding only provide a decision criterion to choose between two or more models, they do not provide any guidance on how to search through the model space. Typically, deterministic search techniques such as the expectation maximization (EM) algorithm have been used extensively with the MML/MDL principles to find the single...
متن کاملMinimum Message Length Grouping of Ordered Data
Explicit segmentation is the partitioning of data into homogeneous regions by specifying cut-points. W. D. Fisher (1958) gave an early example of explicit segmentation based on the minimisation of squared error. Fisher called this the grouping problem and came up with a polynomial time Dynamic Programming Algorithm (DPA). Oliver, Baxter and colleagues (1996,1997,1998) have applied the informati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
دوره 2013 شماره
صفحات -
تاریخ انتشار 2013